89 research outputs found

    MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning

    Audio-visual learning helps to comprehensively understand the world by fusing practical information from multiple modalities. However, recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a bottleneck to enhancing the model's performance. We further find that the up-to-date imbalance-mitigating methods fail on some audio-visual fine-grained tasks, which have a higher demand for distinguishable feature distribution. Fueled by the success of cosine loss that builds hyperspherical feature spaces and achieves lower intra-class angular variability, this paper proposes Multi-Modal Cosine loss, MMCosine. It performs a modality-wise $L_2$ normalization to features and weights towards balanced and better multi-modal fine-grained learning. We demonstrate that our method can alleviate the imbalanced optimization from the perspective of weight norm and fully exploit the discriminability of the cosine metric. Extensive experiments prove the effectiveness of our method and the versatility with advanced multi-modal fusion strategies and up-to-date imbalance-mitigating methods.
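
The modality-wise normalization described in this abstract can be illustrated with a minimal sketch. It assumes a concatenation-style fusion in which each class logit is the sum of an audio term and a visual term, and it replaces each inner product with a cosine similarity. The function names, the `scale` parameter, and the toy vectors are hypothetical and are not taken from the paper's code.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Return vec divided by its Euclidean norm (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def cosine_logits(feature, class_weights):
    """Cosine similarity between one feature vector and each class weight vector."""
    f = l2_normalize(feature)
    return [
        sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in class_weights
    ]


def mm_cosine_logits(audio_feat, visual_feat, audio_weights, visual_weights, scale=10.0):
    """Sum the per-modality cosine logits and multiply by a scale factor.

    `scale` is an assumed hyperparameter. Because features and weights are
    normalized per modality, neither modality can dominate through a larger norm.
    """
    audio = cosine_logits(audio_feat, audio_weights)
    visual = cosine_logits(visual_feat, visual_weights)
    return [scale * (a + v) for a, v in zip(audio, visual)]
```

In this sketch, rescaling one modality's features or weights leaves the logits unchanged, which is the property the abstract ties to balancing the two encoders.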

    Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer

    Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks by integrating pre-trained Neural Machine Translation (NMT) models with domain-specific token-level retrieval. However, the reasons underlying its success have not been thoroughly investigated. In this paper, we comprehensively analyze $k$NN-MT through theoretical and empirical studies. Initially, we provide new insights into the working mechanism of $k$NN-MT as an efficient technique to implicitly execute gradient descent on the output projection layer of NMT, indicating that it is a specific case of model fine-tuning. Subsequently, we conduct multi-domain experiments and word-level analysis to examine the differences in performance between $k$NN-MT and entire-model fine-tuning. Our findings suggest that: (1) Incorporating $k$NN-MT with adapters yields comparable translation performance to fine-tuning on in-domain test sets, while achieving better performance on out-of-domain test sets; (2) Fine-tuning significantly outperforms $k$NN-MT on the recall of in-domain low-frequency words, but this gap could be bridged by optimizing the context representations with additional adapter layers. Comment: Accepted by EMNLP202
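
As background for the token-level retrieval mentioned in this abstract, the following is a minimal sketch of the interpolation commonly used in kNN-MT: retrieved neighbors are turned into a distribution over target tokens, which is then mixed with the NMT model's distribution. The function names, the `temperature` and `lam` values, and the toy inputs are assumptions for illustration. The abstract does not specify them, and this sketch does not reproduce the paper's gradient-descent analysis.

```python
import math


def knn_distribution(neighbors, temperature=10.0):
    """Turn retrieved (target_token, distance) pairs into a probability distribution.

    Each neighbor contributes exp(-distance / temperature) to its target token.
    """
    weights = {}
    for token, dist in neighbors:
        weights[token] = weights.get(token, 0.0) + math.exp(-dist / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def interpolate(p_nmt, p_knn, lam=0.5):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_nmt."""
    vocab = set(p_nmt) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_nmt.get(tok, 0.0)
        for tok in vocab
    }
```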

    OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control

    In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim. It employs a bottom-up design approach that allows users to easily design and experiment with various application scenarios on top of GPU-parallelized simulations. It also offers a range of benchmark tasks, presenting challenges ranging from single-drone hovering to over-actuated system tracking. In summary, we propose an open-sourced drone simulation platform, equipped with an extensive suite of tools for drone learning. It includes 4 drone models, 5 sensor modalities, 4 control modes, over 10 benchmark tasks, and a selection of widely used RL baselines. To showcase the capabilities of OmniDrones and to support future research, we also provide preliminary results on these benchmark tasks. We hope this platform will encourage further studies on applying RL to practical drone systems. Comment: Submitted to IEEE RA-

    Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

    Visual storytelling aims to automatically generate a narrative paragraph from a sequence of images. Existing approaches construct a text description independently for each image and roughly concatenate the descriptions into a story, which leads to semantically incoherent content. In this paper, we propose a new approach to visual storytelling that introduces a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. To combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learns them simultaneously via an iterative updating mechanism. We validate our approach on the VIST dataset, where quantitative results, ablations, and human evaluation demonstrate that our method generates higher-quality stories than state-of-the-art methods. Comment: Accepted to COLING 202

    A Benchmark of Video-Based Clothes-Changing Person Re-Identification

    Person re-identification (Re-ID) is a classical computer vision task that has seen great progress. Recently, long-term Re-ID with clothes-changing has attracted increasing attention. However, existing methods mainly focus on the image-based setting, where richer temporal information is overlooked. In this paper, we focus on the relatively new yet practical problem of clothes-changing video-based person re-identification (CCVReID), which is less studied. We systematically study this problem by simultaneously considering the clothes inconsistency issue and the temporal information contained in the video sequence. Based on this, we develop a two-branch confidence-aware re-ranking framework for handling the CCVReID problem. The proposed framework integrates two branches that consider both the classical appearance features and cloth-free gait features through a confidence-guided re-ranking strategy. This method provides a baseline for further studies. We also build two new benchmark datasets for the CCVReID problem, a large-scale synthetic video dataset and a real-world one, both containing human sequences with various clothing changes. We will release the benchmark and code in this work to the public.

    Gains and losses from collusion: an empirical study on market behaviors of China’s power enterprises

    Purpose: Collusion is a common behavior among oligopolistic enterprises seeking an advantage in market competition. This research explores the positive and negative effects of collusion among electricity generation companies through statistical analysis. Specifically, these effects are examined both in the market economy at the macroeconomic level and in enterprise behavior at the microeconomic level. Design/methodology/approach: The research designs a model that extends Porter’s model (Green & Porter, 1984) and to which FIML is applied. Taking the price bidding project launched in China’s power industry as an example, the paper conducts an empirical study of price data collected from the subordinate power plants of China’s five power generation groups in the pilots. Findings: Power generation enterprises face collusion issues in the market; specifically, non-cooperative competition and collusion alternate. Under competition, the market is relatively steady and a lower network price forms, which helps the development of the whole industry. Once a cartel forms, however, the price rises, bringing power enterprises into conflict with transmission-distribution companies over their interests. At the same time, a higher power price forms in the market, and consumers suffer losses. All of this is bad for industry development. Collusion among power enterprises not only affects the power price; the market power created by a long-standing cartel also reduces market entry in electricity generation. Market resources are concentrated in the hands of the cartel, which weakens effective competition and has negative effects on users. Implications: The empirical research also indicates that collusion clearly benefits the power enterprises involved. As a form of cooperation, collusion can create synergy between the participating companies. However, it harms the interests of other market entities: while the colluding enterprises cooperate to create common interests, they may harm parties outside the industry. Originality/value: Using empirical methods, the paper takes China’s power industry as an example to show the gains and losses of collusion from two perspectives, namely the market economy and strategic management. Peer Reviewe